Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep archived out of VimPerformanceState #22585

Merged
merged 2 commits into from
Aug 11, 2023

Conversation

kbrock
Copy link
Member

@kbrock kbrock commented Jun 26, 2023

reference:

requires:

extracted:

Overview

We were putting too many archived container ids in the VimPerformanceState records.
This was filling up the database, causing us to process too many records, slowing down metrics capture, and slowing down metrics rollups.

Before

  • Don't take into account whether a record was archived.

After

  • We are no longer putting the ids for records that were archived before our time window in the performance state records

Numbers

ms bytes objects query qry ms rows comments
2,131.6 77,558,748* 1,070,108 13 1,760.5 13,680 rollup-create-vps-before
1,632.5 5,139,458* 393,268 13 1,517.7 174 rollup-create-vps-after
23% 93% 63% 13.8% 99% diff
* Memory usage does not reflect 175,949 freed objects.
* Memory usage does not reflect 59,056 freed objects.
  • this example was chosen because of the high number of archived records
  • this example created an VimPerformanceState record. otherwise, there would have been no change
  • Estimated bytes total changed to 90311091 and 5911237 respectively. This resulted in the same bytes savings
  • The query ms savings is very dependent upon the cache status, and was not stable during the tests. But rows returned does give an indicator that the work on the database was reduced significantly
  • the number of rows returned is the difference between bringing back the archived records for creating the VPS and not.
  • The size of the VPS#state_data went from 167,234 => 1,752 characters. The record is more than 150x smaller.

Outstanding

  • migration/script to prune out existing VimPerformanceStates that contain archived ids

Comment on lines 8 to 10
state = VimPerformanceState.capture(host, capture_time)

expect(state.timestamp).to be_within(1.minute).of(Time.now.beginning_of_hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very weird condition (I fully understand that this is the way it works currently) and a future reader will not understand this. I think it would be helpful to have a NOTE comment or something similar describing why now is used for Infra and thus capture_time is ignored. Not sure if that note should live here or in the code or both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

luckily we no longer need to deal with the nuance of infra vs containers


# Soft deletes can capture in the past (keep the timestamp), otherwise capture current state (timestamp = now)
ts = Time.now if ts.nil? || !objs.first.class.respond_to?(:active_for)
ts = ts.utc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, existing rollup_spec.rb fails with

     NoMethodError:
       undefined method `utc' for "2010-04-14T21:00:00Z":String
     # ./app/models/vim_performance_state.rb:64:in `capture'
     # ./app/models/metric/ci_mixin/state_finders.rb:34:in `vim_performance_state_for_ts'
     # ./app/models/metric/ci_mixin/state_finders.rb:82:in `containers_from_vim_performance_state_for_ts'
     # ./app/models/metric/rollup.rb:271:in `rollup_child_metrics'
     # ./app/models/metric/rollup.rb:211:in `block in rollup_hourly'
     # ./app/models/metric/rollup.rb:211:in `each'
     # ./app/models/metric/rollup.rb:211:in `rollup_hourly'
     # ./app/models/metric/ci_mixin/rollup.rb:96:in `block (2 levels) in perf_rollup'
     # /Users/joerafaniello/.gem/ruby/3.0.6/gems/more_core_extensions-4.4.0/lib/more_core_extensions/core_ext/benchmark/realtime_store.rb:20:in `realtime_store'
     # /Users/joerafaniello/.gem/ruby/3.0.6/gems/more_core_extensions-4.4.0/lib/more_core_extensions/core_ext/benchmark/realtime_store.rb:56:in `realtime_block'
     # ./app/models/metric/ci_mixin/rollup.rb:95:in `block in perf_rollup'
     # /Users/joerafaniello/.gem/ruby/3.0.6/gems/more_core_extensions-4.4.0/lib/more_core_extensions/core_ext/benchmark/realtime_store.rb:20:in `realtime_store'
     # /Users/joerafaniello/.gem/ruby/3.0.6/gems/more_core_extensions-4.4.0/lib/more_core_extensions/core_ext/benchmark/realtime_store.rb:62:in `realtime_block'
     # ./app/models/metric/ci_mixin/rollup.rb:82:in `perf_rollup'
     # ./spec/models/metric/ci_mixin/rollup_spec.rb:439:in `block (4 levels) in <main>'
     # /Users/joerafaniello/.gem/ruby/3.0.6/gems/webmock-3.18.1/lib/webmock/rspec.rb:37:in `block (2 levels) in <main>'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doing container_image.perf_rollup("2010-04-14T21:00:00Z", 'hourly')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea. this is a pisser
think it is a legit issue and not just a spec issue.
looking into this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and fixed a caller and added code to protect this.
This code has been moved into 22589 (and this is built upon that)

FactoryBot.create(:container, :container_image => container_image, :deleted_on => (capture_time - 2.hours))

state = VimPerformanceState.capture(container_image, capture_time)
expect(state.get_assoc(:containers).map(&:to_i).sort).to eq(containers.map(&:id).sort)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expect(state.get_assoc(:containers).map(&:to_i).sort).to eq(containers.map(&:id).sort)
expect(state.get_assoc(:containers)).to match_array(containers)

Copy link
Member Author

@kbrock kbrock Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is comparing to_i to id, so I did need to keep part of that in there.
But I did use match_array and removed the sort calls.

# not deleted is captured
containers = FactoryBot.create_list(:container, 1, :container_image => container_image)

state_data = {:assoc_ids => {:containers => {:on => (containers.map(&:id) + [22, 23, 24, 25, 26, 27])}}}
Copy link
Member

@Fryguy Fryguy Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the random numbers for? If it's to get "other" ids, prefer building the id list from existing records, because a) this doesn't handle multi-region properly where the ids aren't in the 0 ragion and b) there's a chance the existing records could fall into the hardcoded range and fail the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put in a comment.

these represent ids that are no longer valid.
If container.id is within that range, then one of those ids will work, but the others will be thrown away. so there will be no false positives in here.

I could do containers.last.id and build from there? - but that seemed like a bunch of code that will create invalid ids just as easily as this does.

Copy link
Member

@Fryguy Fryguy Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containers.last.id sounds reasonable - we do this in other specs as well. We've hit problems in other places where we fake ids for records that either don't exist or shouldn't exist and hardcoded ids have always eventually caused problems.

@kbrock kbrock force-pushed the capture_vim_performance branch 3 times, most recently from c0d7ee2 to d1fba45 Compare June 26, 2023 17:59
end
end

describe ".active_for" do
Copy link
Member

@Fryguy Fryguy Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a few more cases (looking at my drawing that I shared the other day).

(created/deleted wording is relative to the time range in question (i.e. during means during the time range); blank in deleted means not deleted (i.e. still active))

case created deleted in range
a before before
b before during
c before after
d before
e during during
f during after
g during
h after after
i after

I think we might also need to handle cases where created is nil, which is cases where we're not sure when it was created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem: it is very tricky to define created across all container models.

@Fryguy Do you want to define different active_on scopes for each record?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need created for the first pass as discussed - I think h and i were hard for that reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy Do you want to define different active_on scopes for each record?

Maybe? Do have a list of the record types that are "different"? Or possibly we can normalize a different way - perhaps @agrare has some thoughts here.

@kbrock
Copy link
Member Author

kbrock commented Jun 28, 2023

update:

@kbrock kbrock changed the title Keep archived out of VimPerformanceState [WIP] Keep archived out of VimPerformanceState Jun 28, 2023
@kbrock kbrock added the wip label Jun 28, 2023
@kbrock kbrock force-pushed the capture_vim_performance branch 2 times, most recently from 6d17fb1 to 5688e61 Compare June 29, 2023 01:48
@kbrock
Copy link
Member Author

kbrock commented Jun 29, 2023

update:

  • fixed failing test ( s/&klass/klass/)
  • updated performance numbers in git comment.

update:

  • rubocop in spec: Time.now.utc.beginning_of_hour

@kbrock
Copy link
Member Author

kbrock commented Jun 29, 2023

update:

  • updated specs to better show where the change happened around capture and now fails when you comment out the changed code.
  • fixed cop (re: unused state) - thnx cop for finding that spec was lacking.

VimPerformanceState records only capture active objects and those deleted after this hour's time period
This cuts down on the number of records processed. (by a lot)

Numbers
=======

```
VimPerformanceState.first ; ExtManagementSystem.first ; ContainerImage.first ; Container.first ; nil
bookend(gc:true) { Metric::Rollup.rollup_child_metrics(ContainerImage.find(627), Time.now.utc.iso8601, "historical", :containers) }
```

Numbers run for rollup_child_metrics for a parent object that hasn't been created yet.
If the parent object has been run, then we don't need a capture, and we'll see no difference

|       ms |       bytes |   objects |query |  qry ms |     rows |` comments`
|      ---:|         ---:|       ---:|  ---:|     ---:|      ---:|  ---
|  2,131.6 | 77,558,748* | 1,070,108 |   13 | 1,760.5 |   13,680 |`rollup-create-vps-before`
|  1,632.5 |  5,139,458* |   393,268 |   13 | 1,517.7 |      174 |`rollup-create-vps-after`

* Memory usage does not reflect 175,949 freed objects.
* Memory usage does not reflect 59,056 freed objects.

We've already merged code to reduce the number of records returned from VimPerformanceState#containers
So removing archived from the VimPerformanceState will reduce the size of the query sent to fetch containers.
This will reduce the query ms (simpler/smaller query), but this does not reduce the number of records returned
And VPS#state_data went from 167,234 => 1,752 characters

The savings there is due to the fact that capture does not download all the archived records.

So this commit saves all the id downloads, and the commit from the fetch pr (just merged a few commits back),
reduces all archived records fetch as part of the capture process
@kbrock kbrock changed the title [WIP] Keep archived out of VimPerformanceState Keep archived out of VimPerformanceState Aug 1, 2023
@kbrock
Copy link
Member Author

kbrock commented Aug 1, 2023

update:

  • rebased

UN-WIP: this has reduced quite a bit in complexity

@kbrock kbrock removed the wip label Aug 1, 2023
@miq-bot
Copy link
Member

miq-bot commented Aug 1, 2023

Checked commits kbrock/manageiq@65d0a84~...3ddd7d6 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint
2 files checked, 0 offenses detected
Everything looks fine. 🍰

return false unless resource.respond_to?(method)

records = resource.send(method)
records = records.not_archived_before(timestamp) if records.try(:klass).respond_to?(:not_archived_before)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only actual changed line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have the try here?
-- @Fryguy

If the association is a virtual attribute, like it is for MiqRegion, then it will return an array that does not have a klass.

else
assoc
end
return false unless resource.respond_to?(method)
Copy link
Member Author

@kbrock kbrock Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a minor change to return something so the parent knows we need to next
(codeclimate this code to be its own method)

@Fryguy Fryguy merged commit 900fb1b into ManageIQ:master Aug 11, 2023
9 checks passed
@Fryguy
Copy link
Member

Fryguy commented Aug 11, 2023

Backported to petrosian in commit 10f15a3.

commit 10f15a3f1414f1210269dd00dcccd464e6cb64d8
Author: Jason Frey <[email protected]>
Date:   Fri Aug 11 15:07:24 2023 -0400

    Merge pull request #22585 from kbrock/capture_vim_performance
    
    Keep archived out of VimPerformanceState
    
    (cherry picked from commit 900fb1bc6d235dc41668dc11cc79f0621429cc4f)

Fryguy added a commit that referenced this pull request Aug 11, 2023
Keep archived out of VimPerformanceState

(cherry picked from commit 900fb1b)
@kbrock kbrock deleted the capture_vim_performance branch August 11, 2023 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants